056e8e9c8ca9929cb6cf198952bf1dbb-Supplemental-Conference.pdf

Neural Information Processing Systems

This search does not affect the computational complexity, which is $O(\nu_n D_E + S_E)$ for agent $n$ that computes $D_E$ parallel consensus steps and goes over a list of $S_E$ action profiles. Intuitively, we would need $S_E = K^N$ to find the optimal action profile even with no noise, which creates delays where agents have to wait for their average reward to go above their $\lambda_n$. In the multitasking robots game, if agent $n$ has $R^e_n = 0$, then the optimal action profile $a^e$ has to satisfy $a^e_m = n$ for all $m$. If $\lambda$ is a safe margin away from the boundary of $C(G)$, then most agents will have $R^e_n = 0$ most of the time. Hence, their performance depends on the best action profile in $S_E$.


Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

Neural Information Processing Systems

We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret minimization algorithms that guarantee for each agent $v$ an individual expected regret of $\widetilde{O}\left(\sqrt{\left(1+\frac{K}{\left|\mathcal{N}\left(v\right)\right|}\right)T}\right)$, where $T$ is the number of time steps, $K$ is the number of actions and $\mathcal{N}\left(v\right)$ is the set of neighbors of agent $v$ in the communication graph. We present algorithms both for the case that the communication graph is known to all the agents, and for the case that the graph is unknown. When the graph is unknown, each agent knows only the set of its neighbors and an upper bound on the total number of agents. The individual regret between the models differs only by a logarithmic factor.
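As an illustrative sketch (not part of the paper), the leading-order term of the individual regret bound above can be evaluated numerically; the constants and logarithmic factors hidden by $\widetilde{O}$ are ignored here, and the function name is a hypothetical helper:

```python
import math

def individual_regret_bound(T: int, K: int, num_neighbors: int) -> float:
    """Leading-order term of the per-agent regret bound
    sqrt((1 + K / |N(v)|) * T), with constants and log factors dropped."""
    return math.sqrt((1 + K / num_neighbors) * T)

# A well-connected agent effectively splits the K actions across its
# neighborhood, approaching the sqrt(2T) rate; an isolated agent pays
# close to the single-agent sqrt(K * T) rate.
print(individual_regret_bound(T=10_000, K=10, num_neighbors=10))  # ≈ 141.4
print(individual_regret_bound(T=10_000, K=10, num_neighbors=1))   # ≈ 331.7
```

The comparison shows why the bound improves with neighborhood size: the $K/|\mathcal{N}(v)|$ term shrinks as more neighbors share exploration.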


Neurosymbolic Transformers for Multi-Agent Communication

Neural Information Processing Systems

We study the problem of inferring communication structures that can solve cooperative multi-agent planning problems while minimizing the amount of communication. We quantify the amount of communication as the maximum degree of the communication graph; this metric captures settings where agents have limited bandwidth. Minimizing communication is challenging due to the combinatorial nature of both the decision space and the objective; for instance, we cannot solve this problem by training neural networks using gradient descent. We propose a novel algorithm that synthesizes a control policy that combines a programmatic communication policy used to generate the communication graph with a transformer policy network used to choose actions. Our algorithm first trains the transformer policy, which implicitly generates a soft communication graph; then, it synthesizes a programmatic communication policy that hardens this graph, forming a neurosymbolic transformer. Our experiments demonstrate how our approach can synthesize policies that generate low-degree communication graphs while maintaining near-optimal performance.
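A minimal sketch of the "hardening" step described above, under simplifying assumptions: the soft communication graph is given as a matrix of attention-style weights, and hardening is approximated by a top-k rule that keeps each agent's strongest incoming edges (bounding in-degree). The function and the rule are illustrative stand-ins, not the paper's synthesized programmatic policy:

```python
def harden_graph(soft_weights, max_degree):
    """Harden a soft communication graph into a binary adjacency matrix.

    soft_weights[i][j] is the soft weight of communication from agent j
    to agent i. For each agent i, keep only its max_degree strongest
    incoming edges (a top-k approximation of degree-bounded hardening).
    """
    n = len(soft_weights)
    hard = [[0] * n for _ in range(n)]
    for i in range(n):
        # rank potential senders for agent i, excluding self-edges
        candidates = [(w, j) for j, w in enumerate(soft_weights[i]) if j != i]
        for _, j in sorted(candidates, reverse=True)[:max_degree]:
            hard[i][j] = 1
    return hard

soft = [[0.9, 0.5, 0.1],
        [0.2, 0.8, 0.7],
        [0.6, 0.3, 0.9]]
print(harden_graph(soft, max_degree=1))  # → [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

With `max_degree=1`, each agent listens to exactly one neighbor, which is the kind of low-degree graph the paper's bandwidth metric rewards.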


MASPRM: Multi-Agent System Process Reward Model

Yazdani, Milad, Mostajabdaveh, Mahdi, Zhou, Zirui, Xiong, Ying

arXiv.org Artificial Intelligence

Practical deployment of Multi-Agent Systems (MAS) demands strong test-time performance, motivating methods that guide inference-time search and selectively spend compute to improve quality. We present the Multi-Agent System Process Reward Model (MASPRM). It assigns per-action, per-agent values to partial inter-agent transcripts and acts as an inference-time controller. MASPRM is trained from multi-agent Monte Carlo Tree Search (MCTS) rollouts without requiring step-level human annotations, by propagating returns to local targets. At inference, MASPRM guides step-level beam search and MCTS, focusing computation on promising branches and pruning early. On GSM8K and MATH, MASPRM-guided decoding with an outcome reward model (ORM) applied to the final answer improves exact match (EM) over a single straight-through MAS pass by $+30.7$ and $+22.9$ points, respectively. A MASPRM trained on GSM8K transfers zero-shot to MATH without retraining, adding $8.4$ EM points at the same budget. MASPRM is a plug-in value model that estimates per-agent progress and complements verifier-style decoders, enabling more reliable, compute-aware multi-agent reasoning. Code: https://github.com/milad1378yz/MASPRM
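The value-guided step-level beam search described above can be sketched generically. Here `expand` and `value` are hypothetical stand-ins for the MAS step generator and the MASPRM scorer; this is an illustrative sketch of the search pattern, not the authors' implementation:

```python
from typing import Callable, List, Tuple

def prm_beam_search(
    initial: str,
    expand: Callable[[str], List[str]],  # proposes next-step continuations
    value: Callable[[str], float],       # process-reward score of a partial transcript
    beam_width: int,
    depth: int,
) -> str:
    """Step-level beam search guided by a process reward model:
    at each step, score every candidate continuation and keep only the
    beam_width highest-valued partial transcripts, pruning the rest early."""
    beam: List[Tuple[float, str]] = [(value(initial), initial)]
    for _ in range(depth):
        candidates = [(value(c), c) for _, t in beam for c in expand(t)]
        if not candidates:
            break
        beam = sorted(candidates, reverse=True)[:beam_width]
    return max(beam)[1]

# Toy usage: each step appends 'a' or 'b'; the toy "PRM" counts 'a's.
best = prm_beam_search(
    initial="",
    expand=lambda t: [t + "a", t + "b"],
    value=lambda t: t.count("a"),
    beam_width=2,
    depth=3,
)
print(best)  # → "aaa"
```

The same skeleton accommodates an outcome reward model by rescoring only the completed transcripts at the final step.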


Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps

Neural Information Processing Systems

Multi-agent collaborative perception can significantly improve perception performance by enabling agents to share complementary information with each other through communication. However, this inevitably results in a fundamental trade-off between perception performance and communication bandwidth.